CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance

نویسندگان

  • Shuhua Wang
  • Bin Wang
  • Hao Lang
  • Xueqi Cheng
چکیده

This paper introduces our work in the TREC2005 SPAM track. Naïve Bayes and Littlestone’s Winnow are chosen as our basic classifiers. In our investigation, we found that when the structures of Ham and Spam are very different, the feature distributions of them vary a lot. Thus the factor of structure is introduced into our filter. Besides textual word feature, some kind of other features are also considered in our filter. Our experimental results show that Winnow outperforms Naïve Bayes and the multi-feature model outperforms structure based model.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

BUPT at TREC 2006: Spam Track

This report summarizes our participation in the TREC 2006 spam track, in which we consider the use of Bayesian models for the spam filtering task. Firstly, our anti-spam filter, Kidult, is briefly introduced. And then we try to use weighted adjustment of separating hyperplane and selective classifiers ensemble to improve the filtering performance. Finally, we summarize the relevant results from...

متن کامل

Spam Filtering Using Character-Level Markov Models: Experiments for the TREC 2005 Spam Track

This paper summarizes our participation in the TREC 2005 spam track, in which we consider the use of adaptive statistical data compression models for the spam filtering task. The nature of these models allows them to be employed as Bayesian text classifiers based on character sequences. We experimented with two different compression algorithms under varying model parameters. All four filters th...

متن کامل

DalTREC 2005 Spam Track: Spam Filtering Using N-gram-based Techniques

We briefly describe DalTREC 2005 Spam submission. DalTREC is the TREC research project at Dalhousie University. Four packages were submitted and they resulted in a median performance. The results are interesting and may be seen positive in the light of simplicity of our approaches.

متن کامل

PRIS Kidult Anti-SPAM Solution at the TREC 2005 Spam Track: Improving the Performance of Naive Bayes for Spam Detection

Recently, the spam already constituted a serious problem for both e-mail users and Internet Service Providers (ISP). Solutions to the abuse of spam would be both technical and legal regulatory. This paper reports our solution for the TREC 2005 spam track, in which we consider the use of Naive Bayes spam filter for its desirable properties (simplicity, low time and memory requirements, etc.). Th...

متن کامل

Experiments in TREC 2007 Blog Opinion Task at CAS-ICT

This paper describes our participation in TREC 2007 Blog Track Tasks: Opinion retrieval and Polarity classification. As for Opinion retrieval task, a two-step approach is used to retrieve opinion relevant blog unit (that is blog post and its comments) given a query after filtering Spam blog and extracting blog unit. With Polarity Classification, Drag-push [1] based classifier is employed to get...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005